Take Home Portion:


Consider the data in the file: FinalExamData.csv.
This file contains a column Xt and a column Zt.Your goal is to simply model this data using a vector autoregressive model (VAR) and a multi-layer perceptron model (MLP). You ultimately want to forecast Xt with a horizon of 10 using Zt if it is useful.

1 Load Data

3 Cross Correlation

Answer.
Below Plot shows Xt and Zt are correlated at lag 5. Correlation at lag 5 is 0.861.

## 
## Autocorrelations of series 'X', by lag
## 
##    -16    -15    -14    -13    -12    -11    -10     -9     -8     -7     -6 
##  0.014  0.173 -0.223  0.078 -0.134  0.026  0.126 -0.056 -0.055  0.032 -0.270 
##     -5     -4     -3     -2     -1      0      1      2      3      4      5 
##  0.861 -0.191  0.022 -0.118 -0.118  0.103 -0.003 -0.079  0.063 -0.170  0.156 
##      6      7      8      9     10     11     12     13     14     15     16 
##  0.098 -0.155  0.097 -0.005  0.128  0.023  0.014 -0.024 -0.013 -0.192  0.060

4 Split into Train/Test Set

Question b.i

Provide the code you used to fit the models and make the predictions. (If you find the Zt’s useful, you will need to forecast them to use in making predictions… a very simple univariate model will do.)

Question b.ii.

Find the ASE for each model using the last 10 observations of the Xt dataset. Include your code for this as well.

5 Vector Autoregressive Model

Here is the code to build VAR Model

5.1 VAR Model

## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      7      7      7      7 
## 
## $criteria
##                 1          2          3          4         5         6
## AIC(n)   5.073068   5.136325   5.150271   5.004156  3.072408  2.922366
## HQ(n)    5.171771   5.284380   5.347678   5.250915  3.368520  3.267829
## SC(n)    5.320266   5.507123   5.644668   5.622153  3.814005  3.787562
## FPE(n) 159.695722 170.206101 172.759399 149.508116 21.713955 18.750579
##                7         8         9        10
## AIC(n)  2.462031  2.499070  2.573375  2.562035
## HQ(n)   2.856846  2.943237  3.066894  3.104906
## SC(n)   3.450826  3.611464  3.809369  3.921628
## FPE(n) 11.885771 12.405831 13.460990 13.430655
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: Xt, Zt 
## Deterministic variables: both 
## Sample size: 78 
## Log Likelihood: -284.373 
## Roots of the characteristic polynomial:
## 0.9497 0.9497 0.9031 0.9031 0.9022 0.9022 0.8767 0.8767 0.8739 0.8739 0.803 0.803 0.7598 0.7598
## Call:
## VAR(y = X, p = 7, type = "both")
## 
## 
## Estimation results for equation Xt: 
## =================================== 
## Xt = Xt.l1 + Zt.l1 + Xt.l2 + Zt.l2 + Xt.l3 + Zt.l3 + Xt.l4 + Zt.l4 + Xt.l5 + Zt.l5 + Xt.l6 + Zt.l6 + Xt.l7 + Zt.l7 + const + trend 
## 
##        Estimate Std. Error t value Pr(>|t|)    
## Xt.l1  0.594922   0.099826   5.960 1.30e-07 ***
## Zt.l1 -0.045087   0.107417  -0.420   0.6761    
## Xt.l2 -0.624099   0.096548  -6.464 1.80e-08 ***
## Zt.l2 -0.036826   0.112595  -0.327   0.7447    
## Xt.l3 -0.014268   0.033738  -0.423   0.6738    
## Zt.l3  0.050674   0.110755   0.458   0.6489    
## Xt.l4  0.022832   0.033174   0.688   0.4939    
## Zt.l4  0.202555   0.112824   1.795   0.0775 .  
## Xt.l5  0.049600   0.031658   1.567   0.1223    
## Zt.l5  3.035013   0.115861  26.195  < 2e-16 ***
## Xt.l6  0.057014   0.032482   1.755   0.0842 .  
## Zt.l6 -1.681839   0.320709  -5.244 2.01e-06 ***
## Xt.l7  0.012380   0.032745   0.378   0.7067    
## Zt.l7  2.035433   0.316711   6.427 2.09e-08 ***
## const -8.512893   6.029131  -1.412   0.1630    
## trend -0.002335   0.009031  -0.259   0.7968    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.559 on 62 degrees of freedom
## Multiple R-Squared: 0.9539,  Adjusted R-squared: 0.9428 
## F-statistic: 85.57 on 15 and 62 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation Zt: 
## =================================== 
## Zt = Xt.l1 + Zt.l1 + Xt.l2 + Zt.l2 + Xt.l3 + Zt.l3 + Xt.l4 + Zt.l4 + Xt.l5 + Zt.l5 + Xt.l6 + Zt.l6 + Xt.l7 + Zt.l7 + const + trend 
## 
##        Estimate Std. Error t value Pr(>|t|)   
## Xt.l1  0.002723   0.116698   0.023  0.98146   
## Zt.l1 -0.328601   0.125573  -2.617  0.01114 * 
## Xt.l2  0.116790   0.112867   1.035  0.30480   
## Zt.l2 -0.116573   0.131626  -0.886  0.37923   
## Xt.l3 -0.006358   0.039440  -0.161  0.87246   
## Zt.l3 -0.198717   0.129475  -1.535  0.12992   
## Xt.l4 -0.069321   0.038781  -1.787  0.07875 . 
## Zt.l4 -0.129934   0.131894  -0.985  0.32838   
## Xt.l5  0.016957   0.037008   0.458  0.64842   
## Zt.l5 -0.069548   0.135444  -0.513  0.60944   
## Xt.l6  0.022519   0.037972   0.593  0.55531   
## Zt.l6 -0.131994   0.374916  -0.352  0.72599   
## Xt.l7 -0.067592   0.038279  -1.766  0.08236 . 
## Zt.l7 -0.545246   0.370243  -1.473  0.14590   
## const 24.019154   7.048193   3.408  0.00116 **
## trend  0.008416   0.010557   0.797  0.42842   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.822 on 62 degrees of freedom
## Multiple R-Squared: 0.2855,  Adjusted R-squared: 0.1127 
## F-statistic: 1.652 on 15 and 62 DF,  p-value: 0.08578 
## 
## 
## 
## Covariance matrix of residuals:
##         Xt      Zt
## Xt  2.4296 -0.3202
## Zt -0.3202  3.3203
## 
## Correlation matrix of residuals:
##         Xt      Zt
## Xt  1.0000 -0.1127
## Zt -0.1127  1.0000

5.2 VAR Forecast

This is VAR Forecast with Fanchart. Fanchart shows Xt is forecasts for both variables accounting correlation between them. Xt is much is much better than Zt.

This is VAR ASE

5.3 VAR ASE

## [1] 6.812107

6 Multi-layer perceptron model

Here is the code to build MLP Model

## MLP fit with 5 hidden nodes and 100 repetitions.
## Univariate lags: (1,2,3,6)
## 1 regressor included.
## - Regressor 1 lags: (4,5,6,7,8,10)
## Forecast combined using the mean operator.
## MSE: 0.3723.

6.0.1 MLP Forecast

This is MLP forecast.

6.0.2 ASE

This is MLP ASE.

## [1] 1.299115

7 Ensemble

7.0.1 Ensemble Forecast

Ensemble is average of forecast from both VAR and MLP models.

7.0.2 Ensemble ASE

This is ensemble ASE.

## [1] 3.083184

8 Summary

Model ASE
VAR 6.81
MLP 1.15
Ensemble 2.95

9 Which Model?

Question d) :

Make a quick statement about which of the 3 models you feel is most useful and why.

Answer :

VAR ASE is 6.81 and MLP is 1.15. MLP looks great with lowest ASE. But I would go ahead with Ensemble as it is taking average from both. Both methods use different approach to build model and average prediction from them likely to reduce generalization coming from individual models.

10 Future forecast

Question e) :

Using the model you thought was most useful, provide a plot of the 10 forecasts (time points 96 – 105) as well a list of the 10 forecasts. You do not need prediction intervals for this question.

Since Ensemble has been selected we need to take average from both models for future 10 predictions as follows.

  • VAR model forecasts both \(X_{t}\) and \(Z_{t}\)
  • Use \(Z_{t}\) for duration 96-105 in both models to make future predictions

get the forecast for 96-105 using VAR model. We get both \(X_{t}\) and \(Z_{t}\)

##           fcst    lower    upper       CI
##  [1,] 30.41449 17.60230 43.22669 12.81219
##  [2,] 28.88744 16.02330 41.75159 12.86414
##  [3,] 31.01826 18.05716 43.97936 12.96110
##  [4,] 29.21632 16.25450 42.17815 12.96182
##  [5,] 29.98873 16.80968 43.16779 13.17905
##  [6,] 30.68726 17.21733 44.15719 13.46993
##  [7,] 28.65271 15.18029 42.12513 13.47242
##  [8,] 30.16329 16.57829 43.74829 13.58500
##  [9,] 30.29818 16.66751 43.92885 13.63067
## [10,] 30.15519 16.48236 43.82802 13.67283
##            fcst    lower    upper       CI
##  [1,] 10.342489 6.314470 14.37051 4.028019
##  [2,]  9.649623 5.621514 13.67773 4.028109
##  [3,] 10.100107 6.028596 14.17162 4.071511
##  [4,] 10.143779 6.046612 14.24095 4.097167
##  [5,] 10.111416 6.008520 14.21431 4.102896
##  [6,]  9.953890 5.847267 14.06051 4.106623
##  [7,] 10.104139 5.996575 14.21170 4.107565
##  [8,] 10.029459 5.916788 14.14213 4.112672
##  [9,] 10.224849 6.105721 14.34398 4.119128
## [10,]  9.889032 5.768013 14.01005 4.121019

get the forecast for 96-105 using MLP model. We get both \(X_{t}\) and \(Z_{t}\)

##     t+11     t+12     t+13     t+14     t+15     t+16     t+17     t+18 
## 24.05010 24.14142 17.00253 31.03424 38.01539 30.82702 28.56353 28.27106 
##     t+19     t+20 
## 28.53784 31.88707

forecast of 96:105 using Ensemble

11 Rolling ASE

11.2 Rolling window ASEs

## [1] 48.22527
## [1] 52.21166
## [1] 49.24604

11.3 Rolling Window Plot

12 Conclusion

Rolling window ASE Suggests that VAR and ensemble are close enough. Neural network has higher ASE. I will still go ahead with Ensemble.